Marine ecosystem shifts with deglacial sea-ice loss inferred from ancient DNA shotgun sequencing

Sea ice is a key factor for the functioning and services provided by polar marine ecosystems. However, ecosystem responses to sea-ice loss are largely unknown because time-series data are lacking. Here, we use shotgun metagenomics of marine sedimentary ancient DNA off Kamchatka (Western Bering Sea) covering the last ~20,000 years. We traced shifts from a sea ice-adapted late-glacial ecosystem, characterized by diatoms, copepods, and codfish to an ice-free Holocene characterized by cyanobacteria, salmon, and herring. By providing information about marine ecosystem dynamics across a broad taxonomic spectrum, our data show that ancient DNA will be an important new tool in identifying long-term ecosystem responses to climate transitions for improvements of ocean and cryosphere risk assessments. We conclude that continuing sea-ice decline on the northern Bering Sea shelf might impact on carbon export and disrupt benthic food supply and could allow for a northward expansion of salmon and Pacific herring.

L55: "Our approach includes a broad spectrum of taxonomic and functional groups" -this sentence is unclear; change to "our metagenomic/shotgun approach enables the investigation of" Results: This section needs some focussing. In the current form there are several occasions where parts that should be in the Introduction or Discussion are intermingled with the description of the data and figures/tables, which makes this section difficult to follow. For example, there are discussion points in L179,. Pelagic and Benthic profiles; these are currently part of the sections "Taxonomic assignments to potentially pelagic organisms are dominated by primary producers and fish" and "Benthic sedaDNA sources and network dynamics". While SST and IP25 measurements are shown as part of the figures, a more detailed description of the changes that occurred with varying SST/IP25 or on a temporal scale, i.e.. in relation to the deglacial/sea-ice loss through time, could be included.
L122: Environmental preference of pelagic taxa inferred by network analysis; this is a lengthy section, consider introducing subheadings to describe the sea-ice and ice-covered networks.
L130: "decadal to centennial": You might have done this already, but have a good overview about sedimentation rates and that ~780 years lie between each depth sampled, based on this could you narrow down the estimate how many years would be covered within your sedaDNA sub-sample?
L150: Are you able to investigate these reads to see if they might be one of those two suspected species? Futhermore, the authors go beyond listing taxa and provide an elegant and straightforward statistical analysis of their molecular data against environmental variables by using correlation networks (please see comments below). Altogether, their novel approach allows them to measure the effects of (longterm) climate change on important ocean processes like primary production and food-web interactions at an unprecedented level of detail and taxonomic breadth. The relevance of this work as 'analogous' to ongoing sea-ice loss trends in subarctic coastal environments that is discussed by the authors in Discussion could be emphasized in the introduction already by describing how the 'seasonal sea-ice' to 'see-ice free' conditions over the period studied have changed and relate to documented ongoing and expected e.g., IPCC trends for relevant regions.
4. Data and methodology: Your assessment of the validity of the approach, the quality of the data, and the quality of presentation. We ask reviewers to assess all data, including those provided as supplementary information. If any aspect of the data is outside the scope of your expertise, please note this in your report or in the comments to the editor. We may, on a case-by-case basis, ask reviewers to check code provided by the authors (see this Nature editorial for more information).
Reviewers have the right to view the data and code that underlie the work if it would help in the evaluation, even if these have not been provided with the submission (see this Nature editorial). If essential data are not available, please contact the editor to obtain them before submitting the report.

MY COMMENTS:
Outliers: At least three (1.08, 5.6 and 9.84, cal ka yrs) of the samples presented seem to be outliers in the dataset (own branches on dendrogram Figs 2 and 4, very distant from neighbours), and I suspect the low raw, classified pre-and post-read counts exclusion of taxa and resampling may explain this as an artifact, rather than a true signal. This could be showed by presenting e.g., a PCoA of the samples (with and without controls) with a biplot of the main taxa before and after exclusion of taxa in Suppl. I think removing them would change correlations and interpretations slightly (likely strengthen them), and help present the counts from the Figures 2 and 4 on clearer scales. Please provided the data used to plot Suppl Fig 1 in table form, and please also provide read counts before and after exclusion of taxa (described on l. 409-424).
All figure captions and tables titles: specify whether reads presented are rarefied or not. Fig. 2: Please consider using the same scale for rel. abundances. At present it is difficult to see groups most recorded and transitions. Also, as they are re-sampled and not actual read-counts, I suggest presenting into percentages for ease of comparisons, also to be more consistent with numbers reported in text.
Figs 2 and 4: Boxes for the ice-free ecosystem are different, which one is correct (Fig. 4?)?
l. 83: 72.5% of classified read pairs overlapped and were merged. Is the total read number on l.76 including overlapping read pairs (reads from 'the same DNA strand' counted twice?). Please clarify this elsewhere also, it's important that the merged reads are not counted twice. l. 139: Consider marking the Families (significantly) positively correlated to SST and IP25 in red and blue in Figs 2 and 4 for ease of interpretation. l. 142: Add climatic periods mentioned in test to Figs 2 and 4. l. 190,Fig. 3: Nice figure, well thought out and functional groups easy to distinguish. Please add legend for equivalent % for size of nodes (same applies to other network provided in suppl. materials). Add explanation to caption for labelling of nodes or not (some nodes don't have labels, likely because they have too low %). Some family names are difficult to read and associate to the nodes, consider placing them more off from one another with black lines to the nodes, include fewer names or replace with images of taxa as per Fig 5 where appropriate. l. 322 The chronology was reported in a 2012 publication; please indicate whether the same, 2009 core was used, and if so, indicate the date or year and procedure for the splitting of the core that was used for the DNA work. Cores stored unopened vs opened for many years prior to DNA sub-sampling may be differently affected by secondary growth of e.g., fungi, yeast and Bacteria that would influence DNA results (as per stated by the authors). If different cores were used, please describe the splitting procedure and how those were correlated for the age-depth model. l. 327 As the lab facilities used for sub-sampling are not a dedicated clean facilities, please include the precautions that were undertaken to limit contaminants (air, surfaces, tools) from the wide range of organisms studied at GEOMAR, like fish and phytoplankton. l. 345: Please clarify in text whether the same extracts as ref 15 were used (I assume this is the case, as the same extract controls are analysed), or whether new extracts were made from the samples. If the same extracts were used, please clarify that the protocol is exactly as per ref 15 from the first sentence of this paragraph. l. 376: Were the controls also provided with 10-14 cycles? This may explain why such few reads could be sequenced.
l.390. A 5% replication rate seems impressively low; is this in a similar range from other sedaDNA studies? Could the authors provide the distribution from Suppl. Fig. 1  l. 409-424: I think the approach for inclusion/exclusion of taxa is reasonable, however it would be useful to include the read counts that were removed by this 'exclusion step' from the initial read count at the classification at Family level step. Pls see additional comment on damage patterns. l. 425 Though I acknowledge that similar threshold approaches have been used on similar datasets and at least partly agree with the reasoning behind it, keeping 'only families that occurred in at least 3 samples with at least 10 counts' prior to re-sampling (or other form of normalization) increases the risk of removing true positives, rare taxa that may by chance (or because of sediment matrix effects on library building success) occur only in shallower sampled intervals of the core. Could the authors comment on this effect in relation to their data?
Negative controls: It is good practice that the authors included a number of negative controls from the extraction step in the sequencing, however the number of reads from them is quite low (total of ~500 classified reads from all controls combined). It is actually common for controls to yield relatively high levels of reads assigned compared to samples, because human-related taxa are overall more commonly sequenced and thus more represented in reference databases (erroneously or not), and it would be also important to know many raw reads were sequenced for controls (maybe I have missed this information). Please provide the read counts after the different bioinformatic steps for the controls as per the samples. Ideally, one would provide similar sequencing effort to controls as for samples, however such a standard has not yet been established in the field and 'best practice' approaches from previous works have varied, none reaching this ideal (to my knowledge). Furthermore, it can be challenging to obtain sufficient amounts of reads from controls for sequencing, that would indeed contain very little DNA (increasing the number of indexing PCR amplification cycles usually helps, but also result in high levels of duplication). That being said, I do not have a reason to dispute the authenticity of the results provided for the main taxa detected supporting the MS conclusions other than the relatively minor verifications I have pointed out elsewhere, particularly if the damage patterns from at least some of the representants of major groups of organisms can be established. l. 444: Insufficient number of reads assigned were found for the damage pattern analyses, however pelagic and benthic merged and unmerged counts sum up to potentially higher reads counts than merged and unmerged read counts for some samples. Please clarify. 5. Analytical approach: Your assessment of the strength of the analytical approach, including the validity and comprehensiveness of any statistical tests. If any aspect of the analytical approach is outside the scope of your expertise, please note this in your report or in the comments to the editor.

MY COMMENTS:
I'm not very familiar with correlation networks, but I find the use here to be useful in highlighting differences in community assemblages and function under different climatic regimes. I would like for the authors to address or comment on a couple of questions that would relate to the structure of ancient metagenomic datasets could be addressed in the MS: What is the sensitivity of correlation networks to detect non-linear relationships between taxa, especially over centennial to millennial timescales? Do biogeographical distribution patterns matter on this scale? How is this statistical approach handling the zero-inflated dataset? Trends in the dataset (vs environmental variables) could perhaps be further explored with GAMs (e.g., Simpson, G. L. Modelling palaeoecological time series using generalised additive models. Frontiers in Ecology and Evolution 149 (2018). To what extent do the correlations between families may be affected by reference bias within closely related taxa, where for instance, only a few members of a family with differential degree of coverage may be available for mapping in NCBI? How could this affect interpretations (e.g., l. 155)?
Could the author also comment on the potential impact of downcore autocorrelation between samples for correlation networks and overall correlations with environmental variables to strengthen the conclusions?
All relative abundance datasets suffer from closed sum effects. To further support one of the main conclusions of the MS about the change between smaller and larger celled primary producers from the pelagic environment, the authors could test the strength of the correlation between the ratio of read abundances of small:large phytoplankton and their environmental variables.
6. Suggested improvements: Your suggestions for additional experiments or data that could help strengthen the work and make it suitable for publication in the journal. Suggestions should be limited to the present scope of the manuscript; that is, they should only include what can be reasonably addressed in a revision and exclude what would significantly change the scope of the work. The editor will assess all the suggestions received and provide additional guidance to the authors.

MY COMMENTS:
I have already provided suggestions in previous sections that would, in my opinion, strengthen the manuscript, and added a few more below.
A major point: damage patterns were assessed for diatoms, but overall conclusions would be stronger if the other major groups found were also assessed like fish (codfish, salmon, herring), cyanobacteria and copepods (the latter may have too low counts). Cnidaria, Echinodermata, Mollusca also had potentially sufficiently high number of reads. Where counts are of sufficient abundance, consider also assessing the damage patterns for some of the main groups that were excluded to show the difference in damage. Because it is particularly prone to reference bias because of its high representation in NCBI as a model organism, to further validate the signal of Salmon (vs that of other fish), the authors could consider providing information on whether reads fall within unique parts of its genome (vs other fish).
Presenting rarefaction curves for each sample and for both pelagic and benthic groups in supplementary would help assessing sampling completeness at the Family level. 7. Clarity and context: Your view on the clarity and accessibility of the text, and whether the results have been provided with sufficient context and consideration of previous work. Note that we are not asking for you to comment on language issues such as spelling or grammatical mistakes.

MY COMMENTS:
The manuscript was a nice read, and I have made only a few suggestions that may improve clarity and context for the wider readership.
Intro: Consider providing a bit more context to support the validity of 'climate transition' and the analogous environments compared, by establishing the current, historical and paleo sea-ice conditions, and those expected under future scenarios more clearly (e.g., how many months of sea ice? Extent? spatial patterns?). How do the 'the phases of the Bølling-Allerød and the Holocene' compare with now/expected scenario in relevant regions? This relates to my last comment in the significance section.
l. 26: This sentence: 'This demonstrates the need for a better understanding of ecosystem changes during the transition from seasonal sea-ice to ice-free conditions due to ongoing climate warming.' I'm actually not certain that 'it' does. Perhaps restructure this part of the intro for better flow.
Minor comments: l. 4: 'service' is phrased wrong, should be 'services delivered by' or something along those lines. l. 20. 'timing', do you mean seasonal timing? l. 21: 'play a central role in trophic interactions' could be more specific by adding in what way they play a central role e.g., 'supporting foodwebs as primary producers' or food source etc.
l. 22: Try to be more specific for the wide readership: 'air temperatures' I assume could be written as e.g., 'high air temperatures' or 'record high air temperatures' if that applies? l. 24-26: What were those shifts? Could be more specific here. l. 32: Add 'latest' IPCC report l. 216: Please add the taxa groups names (not only common names), as per above in text. l. 233: specify whether observations were from paleo or modern datasets. 8. References: Your view on whether the manuscript references previous literature appropriately.

MY COMMENTS:
References seemed appropriate and up to date. 9. Your expertise: Please indicate any particular part of the manuscript, data or analyses that you feel is outside the scope of your expertise, or that you were unable to assess fully.

MY COMMENTS:
My expertise lies in paleoecology of Arctic freshwater ecosystems and foodwebs, paleoclimatology as well as shotgun metagenomics, and is therefore more limited for marine environments and organisms beyond characteristics that would apply to aquatic ecosystems more broadly, like major ecological functional groups and foodweb interactions.

Alexandra Rouillard
Reviewer #3 (Remarks to the Author): The manuscript entitled: "Marine ecosystem shifts with deglacial sea-ice loss inferred from ancient DNA shotgun sequencing" suggest that the compositions and structure of a marine ecosystem is different between a period with and without sea ice in the Bearing Sea by looking at fossil record data.
I found the study interesting, but I had concerns about the methodology used especially with regards to the statistical analyses performed in the study. I listed my concerns about the statistical analyses below.
1. L474-476: "As for both variables the first sample age as well as the last two for SST and the last four for IP_25 data could not be interpolated, we calculated the mean of the first and last three interpolated values and replaced the missing data." Why was it not possible to interpolate these samples? Please give detailed explanations.
Also, by using "…the mean of the first and last three interpolated values…" as a way to impute the missing data, the imputed missing values could lead to misinterpretation of the results. This is especially true because there are so few samples in the data. To put it differently, for IP_25, 20% of the data (5 out of 25 samples) are imputed. In this context, using an imputation approach that accounts for temporal autocorrelation (e.g. an Autoregressive integrated moving average [ARIMA] model) would be much more appropriate. In addition, the model used for the imputation should be validated using SST and IP_25 data gathered elsewhere to check if the trend makes sense.
2. Currently, the analysis carried out suppose that each sample is completely independent from the others. In other words, it assumes that there are no temporal autocorrelations in the data. A test of temporal autocorrelation in the taxon data should be carried out to make sure the interpretation is not affected by temporal autocorrelation. If there is temporal autocorrelation in the data, it will need to be accounted for in the analysis and interpretation. If there is no temporal autocorrelation in the taxon data, it would be an interesting addition to the results that is valuable ecologically because it would refine the understanding gain from this system (e.g. a deeper understand of temporal turnover, or temporal beta diversity, could be assessed). There exist a few options to test for temporal autocorrelation for multivariate data, two that come to mind is to use the approach proposed by Legendre andGauthier (2013) or Dray et al. (2008) (both proposed techniques have been implemented in the adespatial R package). Another approach is to test the temporal autocorrelation for each taxon independently, this could be done, e.g. with a Moran's I (Moran 1948) or an ARIMA model.
3. L463-466: "We calculated pairwise Spearman rank correlation coefficients (rho) using corr.test including Benjamini-Hochberg correction for multiple testing from the psych package v. 2.0.1291, and those exceeding at least 0.4 (adjusted p-value < 0.05) were used for undirected network generation by the igraph package v. 1.2.692 without self-loops and isolated nodes." Technically, building the network based on Spearman correlation is problematic because it does not explicitly account for the multivariate nature of the data and as such it is likely to lead to biased results and as such wrong interpretations. In this respect, the approach proposed by Popovic et al. (2019) is much better adapted to the question and data of the present study.
4. L467-468: "We included only positive correlations as negative correlations were rare and did not affect the structure of the network." Negative correlations should be presented and discussed. They present valuable information that could lead to a better understanding of the studied system. Reviewer Comment: L12/13: "continuing sea-ice decline on the northern Bering Sea shelf might pose a risk for changes in carbon export and benthic food supply" -simplify this sentence, eg.: might impact on/reduce carbon export and disrupt benthic …? Our response: Changed accordingly. Revised text L32-34: "We conclude that continuing sea-ice decline on the northern Bering Sea shelf might impact on carbon export and disrupt benthic food supply and could allow for a northward expansion of salmon and Pacific herring." Introduction: Reviewer Comment: The introduction touches on most of the concepts of the study that follows, however, it is missing some background information on the network analysis that follows, which is instead introduced in the results (L123 following), on the characteristics of pelagic and benthic communities (treated separately in the study) and environmental proxies such as IP25 and SSTUK37. All of these aspects are important parts of the study, however, are not introduced/mentioned in the Introduction (currently focussing on food web structure, short-term records and metagenomics). Our response: We included background information on (1) network analysis, (2) pelagic and benthic taxa and (3) environmental proxies in the introduction. The introductory parts about the network analysis were moved from the material and methods sections to the introduction as well as the interpretations of the IP25 and SST proxies. Revised text to include networks in the introduction L87-103: "Correlation networks have been used in the interpretation of multidimensional data by means of co-occurrence networks and contributed to our understanding of ecosystems by identifying, for example, habitat preferences in aquatic bacterial communities 23 or geographic co-occurrence patterns 24 . The sparsity of metagenomic datasets resulting from either true absences or insufficient sequencing depth can lead to spurious correlations 25 . Gaussian copula graphical models (GCGMs) are a novel statistical framework developed to separate environmental effects from intrinsic interactions among taxa 26 . They are suited to address the compositional structure of sequencing data, non-linear correlations, and to remove spurious correlations via mediator taxa (if a correlation between two taxa arises only because both are correlated to a third taxon) or via similar responses to environmental effects (covariance). Cooccurrence networks have not yet been applied to sedaDNA data before. As sedaDNA samples contain averaged information on decadal to centennial time-scale (here on average 15.5 years during the late glacial and 28.7 years during the Holocene in a centimeter thick sample), correlation in the network should not be interpreted as ecological interactions between linked taxa or actual co-occurrence on a short-time scale. We here assume that positively correlated families show similar responses to environmental changes (i.e., sea ice coverage and SSTs)." L201-202: "To identify taxa that show a similar response to sea ice loss, we used cooccurrence networks and compared Spearman networks with GCGMs." Included citations: 23.
Revised text pelagic and benthic taxa in the introduction L40-43: "This will likely change the seasonal timing, biomass, and composition of algal blooms, which play a central role in trophic interactions by supporting food webs, including benthic communities as primary producers 2 ." L 63-65: "While pelagic communities are strongly linked to hydrographic factors and climate, benthic communities are strongly dependent on primary production and sinking particles from the water column above 9 ." L 55-58: "Despite extensive monitoring efforts, we still have scarce knowledge on long-term ecosystem responses to climate transitions for many taxonomic groups, particularly zooplankton, fish and benthic organisms such as clams, tunicates, starfish or macrophytes." L 60-63: "The short-term responses of the Bering Sea ecosystem to warmer and colder regimes have been documented well 2,5,8 , yet long-term rearrangements of the organismal composition in pelagic and benthic communities are not clear and more data are needed to constrain the future ecosystem development." Revised text environmental proxies in the introduction L 69-75: "Marine sediments are a natural archive of climate history from which sea ice can be reconstructed via proxy records, such as from biomarkers 10 or microfossil remains 11 . A regional palaeoceanographic framework for past sea ice coverage in the Bering Sea is based on reconstructions using the highly branched isoprenoid alkane biomarker IP25 12 , produced by diatoms bound to a life in seasonal sea ice 10 , and diatom microfossils 11 . Alkenone-based sea surface temperatures (SSTUk'37), which are produced by haptophytes, and can add complementary information about SSTs of the late summer/early fall season 13 ." L 108-113: "The paleoenvironmental backbone is based on previous multi-proxy reconstructions of the coring site derived from this core, including diatom microfossils and the biomarker IP25, which show that the coring site was covered by seasonal sea-ice during the Last Glacial Maximum (LGM), Heinrich Stadial 1, and the Younger Dryas, while the phases of the Bølling-Allerød and the Holocene were predominantly sea-ice free 12,16 . " Citations: 9.

12.
Méheust, M., Stein, R., Fahl, K. & Gersonde, R. Sea-ice variability in the subarctic North Pacific and adjacent Bering Sea during the past 25 ka: new insights from IP25 and Reviewer Comment: L17: This sentence should be rephrased, as not all organisms can be considered a carbon sink, there are many organisms that respire, too, and so might be considered a carbon source. Our response: We rephrased the sentence. Revised text L 37-40: "The organismal composition in high-latitudinal oceans is highly vulnerable to anthropogenic global warming, which may alter ecosystem services (e.g., food supply, biological carbon pump) due to sea-ice loss and related effects on rising ocean temperature, increasing light transmission, stronger water column stratification and decreasing nutrient availability 1 ." Reviewer Comment: L19: Be more specific on how exactly sea-ice loss effects these following parameters that you list, not just that they are related. Our response: We added the information. Revised text L 37-40: "The organismal composition in high-latitudinal oceans is highly vulnerable to anthropogenic global warming, which may alter ecosystem services (e.g., food supply, biological carbon pump) due to sea-ice loss and related effects on rising ocean temperature, increasing light transmission, stronger water column stratification and decreasing nutrient availability 1 " Reviewer Comment: L30: I'm not sure if protists are the best example here, as there are many studies on nano-and microfossil assemblages which would have all been protists at the time they were alive. You could refer to organisms that are not normally part of conventional studies, in your study that may be copepods and fish. Our response: Changed accordingly. Revised text L 55-58: "Despite extensive monitoring efforts, we still have scarce knowledge on long-term ecosystem responses to climate transitions for many taxonomic groups particularly zooplankton, fish, non-fossilizing algae, and benthic organisms such as tunicates, starfish or macrophytes."

Results:
Reviewer Comment: This section needs some focussing. In the current form there are several occasions where parts that should be in the Introduction or Discussion are intermingled with the description of the data and figures/tables, which makes this section difficult to follow. For example, there are discussion points in L179, Our response: We moved parts of the results and material and methods to the introductions. However, we would like to keep the discussion close to the results. To avoid confusion we merged the result and discussion section and added subchapters. For overall conclusions we now use the title "Implications for sea ice decline in the northern Bering Sea". Moved and revised text from results/material and methods to introduction L87-103: "Correlation networks have been used in the interpretation of multidimensional data by means of co-occurrence networks and contributed to our understanding of ecosystems by identifying, for example, habitat preferences in aquatic bacterial communities 24 or geographic co-occurrence patterns 25 . The sparsity of metagenomic datasets resulting from either true absences or insufficient sequencing depth can lead to spurious correlations 26 . Gaussian copula graphical models (GCGMs) are a novel statistical framework developed to separate environmental effects from intrinsic interactions among taxa 27 . They are suited to address the compositional structure of sequencing data, non-linear correlations, and to remove spurious correlations via mediator taxa (if a correlation between two taxa arises only because both are correlated to a third taxon) or via similar responses to environmental effects (covariance). Correlation networks have not yet been applied to sedaDNA data before. As sedaDNA samples contain averaged information on decadal to centennial timescale (here on average 15.5 years during the late glacial and 28.7 years during the Holocene in a centimeter thick sample), correlation in the network should not be interpreted as ecological interactions between linked taxa or actual co-occurrence on a short-time scale. We here assume that positively correlated families show similar responses to environmental changes (i.e., sea ice coverage and SSTs)." L 69-75: "Marine sediments are a natural archive of climate history from which sea ice can be reconstructed via proxy records, such as from biomarkers 10 or microfossil remains 11 . A regional palaeoceanographic framework for past sea ice coverage in the Bering Sea is based on reconstructions using the highly branched isoprenoid alkane biomarker IP25 12 which is produced by diatoms that are bound to a life in sea ice 10 and diatom microfossils 11 . High values of IP25 suggest seasonal sea-ice and ice-edge processes, while its absence could indicate either permanent sea-ice coverage or dominantly ice-free conditions. Alkenone-based sea surface temperatures (SSTUk'37), which are produced by haptophytes, and can add complementary information about SSTs of the late summer/early fall season 13 ." L 108-113: "The paleoenvironmental backbone is based on previous multi-proxy reconstructions of the coring site derived from this core, including diatom microfossils and the biomarker IP25, which show that the coring site was covered by seasonal sea-ice during the Last Glacial Maximum (LGM), Heinrich Stadial 1, and the Younger Dryas, while the phases of the Bølling-Allerød and the Holocene were predominantly sea-ice free 12,16 . " Reviewer Comment: Fig. 2 and Fig. 4. Pelagic and Benthic profiles; these are currently part of the sections "Taxonomic assignments to potentially pelagic organisms are dominated by primary producers and fish" and "Benthic sedaDNA sources and network dynamics". While SST and IP25 measurements are shown as part of the figures, a more detailed description of the changes that occurred with varying SST/IP25 or on a temporal scale, i.e.. in relation to the deglacial/sea-ice loss through time, could be included. Our response: The temporal changes of the pelagic community is already described in the text. We now included the curve and description of the temporal development of the ratio between pelagic and benthic read counts. And added some text on the benthic community changes. Revised text L 297-304: "The ratio between counts assigned to pelagic and benthic families (pelagic:benthic ratio) is highest in samples dated to the late glacial and Younger Dryas compared to the warmer Bølling/Allerød and the Holocene (Fig. 4). The pelagic:benthic ratio shows a weak, negative correlation with SSTs (ρ = -0.39, p-value = 0.052), indicating that more reads are assigned to benthic families (and more reads to pelagic families) in phases with warmer SSTs. In our dataset, Zosteraceae have higher read counts during the Holocene and are significantly, positively correlated with summer/early fall SSTs (ρ = 0.73, adjusted p-value = 0.026; Fig. 4b). Laminariaceae (kelp) have higher read counts during the late glacial.

Reviewer Comment: L122: Environmental preference of pelagic taxa inferred by network analysis; this is a lengthy section, consider introducing subheadings to describe the sea-ice and ice-covered networks.
Our response: We followed this suggestion and split the network analysis and the environmental preference of pelagic taxa into two sections, thereby also accounting for the new network analysis approach. Revised section titles: We split the sections into first "Co-occurrence network-derived environment associations" (L 183ff) and a second "Environmental preference of pelagic taxa inferred by network analysis" (L 209ff).
Reviewer Comment: L130: "decadal to centennial": You might have done this already, but have a good overview about sedimentation rates and that ~780 years lie between each depth sampled, based on this could you narrow down the estimate how many years would be covered within your sedaDNA sub-sample? Our response: We now give an estimation of accumulated years per 1 cm sub-sample in the introduction section and added the following text. Revised text L 97-101: "As sedaDNA samples contain averaged information on decadal to centennial time-scale depending on sedimentation rates (here according to our age model on average 15.5 years during the late glacial and 28.7 years during the Holocene), correlation in the network should not be interpreted as ecological interactions between linked taxa or actual co-occurrence on a short-time scale." Reviewer Comment: L150: Are you able to investigate these reads to see if they might be one of those two suspected species? Our response: We analyzed our data and the database to show whether our approach could distinguish between the two species. Both Gadus chalcogrammus and Gadus macrocephalus are represented in the database by 9,943 and 5,727 unique minimizers (short substrings used for binning of k-mers) respectively. Gadus morhua is however much more represented in the database with 131,648,356 unique minimizers, hence classifications on species level could be prone to a reference bias. On species level 6 counts were assigned to Gadus chalcogrammus, no counts were assigned to Gadus macrocephalus, but 159 were assigned to Boreogadus saida (Polar cod).

Revised text L 219-224:
"Reads assigned to this family can most likely be attributed to Pacific cod (Gadus macrocephalus), walleye pollock (Gadus chalcogrammus) or Polar cod (Boreogadus saida). While pollock and Pacific cod are generalist predators in the Bering Sea, they are strongly influenced by bottom-up effects through prey composition and availability 40,46 compared to Polar cod which is a keystone species of the high-Arctic with a strong linkage to sea ice 47 ." Revised text in Supplementary Information: "Among Gadidae, our reads were assigned to 7 of the 12 genera represented in the database. All of them except for Micromesistius (2 counts assigned) are present in the Bering Sea or the Pacific Arctic. The majority of reads were assigned to the genus Gadus (35,832 counts), which is driving the trends on family level. Notable, but still few assignments were found for the genus Boreogadus (Polar cod; 163 counts). Today, Polar cod is a keystone species of the high-Arctic with a strong linkage to sea ice 4 , which supports their presence only during the late glacial samples." Included Reference: Kohlbach, D. et al. Strong linkage of polar cod (Boreogadus saida) to sea ice algaeproduced carbon: Evidence from stomach content, fatty acid and stable isotope analyses. Progress in Oceanography 152, 62-74 (2017).

Reviewer Comment: L197: I note that the network analysis for the benthic families is provided with the Supplementary Material, but if not limited in figure numbers it might be worth adding this figure to the main text.
Our response: For pelagic network we present spearmen and ecoCoupula (suggested by Reviewer3), however, for the benthic taxa much less reads are assigned. As ecoCoupula requires a higher data number to set up reliable models, we refrained from implementing network analyses for benthic taxa at all.

Discussion:
Reviewer Comment: Much of the discussion currently focuses on speculation about potential changes in carbon export resulting from changes in differently sized phytoplankton communities. While this is an important point that is worthy of discussion in this manuscript, I am missing some discussion of the findings of the study about changes in the pelagic and benthic communities with time, correlations with SST, IP25, and in relation to the discussion points that are currently part of the results (see above). Our response: In accordance with the formatting guide we now use the Section heading "Results and discussion" with additional sub-heading addressing to present and discuss major results of our study. In the extra "discussion" section, which we entitled: "Implications for sea ice decline in the northern Bering Sea" we now discuss how pelagic and benthic communities might change under future warming and sea ice decline. Revised text: See section "Implications for sea ice decline in the northern Bering Sea" (L 328ff).

Methods:
Reviewer Comment: L334: Add year of extraction.

Our response: Information was added
Revised text L 395-398: "In 2018 we collected and extracted samples from the archive half of a 9.05 m piston corer (SO201-2-12KL, Fig. 1) which was retrieved during RV Sonne cruise SO-201 "KALMAR" in 2009 43 and since then stored at 4°C. We applied the chronostratigraphy established by Max et al. 43 .
Reviewer Comment:L341: 2-4mL -the mL using makes me wonder if this was a sediment slush, was it diluted? If not I suggest to use cm3 or g as unit.
Our response: We apologize for the confusion. We were subsampling with syringes of which we have cut off the front and extracted all of it. This is why a volume of 2-4mL was taken and extracted. Revised text L 409-410: "The DNeasy PowerMax Soil Kit (Qiagen, Germany) was used to extract total DNA from an average of 7.5 g sediment." Reviewer Comment: L350: change "with twice" to "twice with" Our response: Changed. Revised text L 417-420: "We measured the DNA concentration on a Qubit 4.0 fluorometer (Invitrogen, Carlsbad, CA, USA) using the Qubit dsDNA BR Assay Kit (Invitrogen, USA) and depending on the initial DNA concentrations we concentrated a volume of 300 µL or 600 µL using the GeneJET PCR Purification Kit (Thermo Scientific, USA) and eluted twice with 25 µL elution buffer." Reviewer Comment: L362: Add after "two": "blanks" Our response: Included. Revised text L 430-433: "Therefore, five of the seven extraction blanks (50 µL per extraction blank were pooled and 5 µL of the pool used for the library preparation) were sequenced together with their corresponding samples in the first run, while the last two blanks were processed later in the second sequencing run." Reviewer Comment: L486/487: Delete, this is already in the previous sentence. Our response: We removed the duplicated sentence.

Reviewer #2 (Remarks to the Author):
Review of MS NCOMMS-22-05048 1. Key results: Your overview of the key messages of the study, in your own words, highlighting what you find significant or notable. Usually, this can be summarized in a short paragraph.

MY COMMENTS:
This study brings the field of ancient environmental DNA (eDNA) forwards by presenting change in functional diversity of a wide range of marine organisms (notably prokaryotic and eukaryotic phytoplankton, macrophytes, fish, some marine mammals, but also potentially less described invertebrates) during the last 20,000 using a novel ancient eDNA approach on marine sediments (shotgun metagenomics). Correlation network analyses and correlations to paleoclimatic variables (SST, IP25) are used to associate taxonomic and functional groups to climate-driven change in sea-ice conditions at the site. The main 'shifts' in ecosystem response to post-glacial sea-ice loss recorded by this novel approach lie in smaller-sized algal primary producers related to different C storage and export from the oceans and major fish community reorganization at higher trophic levels.

Validity:
Your evaluation of the validity and robustness of the data interpretation and conclusions. If you feel there are flaws that prohibit the manuscript's publication, please describe them in detail.

MY COMMENTS:
Reviewer Comment: 103-106: While the taphonomic-based explanations provided are plausible causes for the discrepancy in relative abundance, I would think this difference is also likely due to database bias, where a number of groups e.g., invertebrates are relatively poorly represented in NCBI. Perhaps the authors have a better idea of genomic coverage of their organisms of interest to assess reference bias, please provide if possible.
Our response: Database gaps are present for many taxonomic groups. In December 2020, the year we downloaded the database, genomes of 594 fish species were available, which is about 1.85% of estimated fish species on our planet (Randhawar & Pawa 2021). For the Western Bering Sea, we are lucky that the dominant fish (e.g. Salmonidae, Gadidae, Clupeidae) and Copepod (e.g. Calanidae) families are represented by at least one reference genome. Aside from Copepods, other invertebrates such as Echinoderms, Molluscs, and Cnidarians are represented in the database. For example, for nematodes, 154 genomes and over 4 million nucleotide sequences are available.
Revised text L 161-163: "Alternatively, this discrepancy could be explained by differential preservation or degradation due to different skeletal properties or large gaps in the sequence reference database regarding marine macrofauna for both barcoding genes and genomes 40 . " Added reference: 40. Hestetun, J.T., Bye-Ingebrigtsen, E., Nilsson, R.H. et al. Significant taxon sampling gaps in DNA databases limit the operational use of marine macrofauna metabarcoding. Mar. Biodivers. 50, 70 (2020). https://doi.org/10.1007/s12526-020-01093-5 Reviewer Comment: l. 147: I could not find the read counts associated to copepods (rather %), please add. Our response: We included read counts from all families in Supplementary data (file1) and added the following information in the main text. Revised text L 210ff: "Among algae, this pattern can be observed for Bacillariaceae (0.8% of pelagic counts), Stephanodiscaceae (0.6%), Thalassiosiraceae (1.1%), Triparmaceae (0.3%), and Phaeocystaceae (0.6%), which can be found in sea ice or as a part of cryopelagic communities 45 . They co-occur with the chlorophytes Mamiellaceae (0.5%) and Bathycoccaceae (1.7%), which contain cold-adapted lineages that prevail in the marginal ice zone or in landfast sea ice 46,47 . Closely linked to these primary producers are copepods (Calanidae 0.03%, Metridinidae 0.03%), [...]" Reviewer Comment: l. 174: Warmer conditions seem to favor early-stage fish that, in the case of salmon at least, would occur in freshwater rivers, possibly in quite distant localities, correct? Do we know where the fish populations at the site breed at present? The evidence here is for SST, strictly speaking, which globally would be correlated to land temperatures of the breeding grounds, but supporting evidence/references would help make this point clearer. Our response: Yes, this is correct, spawning grounds for Salmonidae are rivers in Kamchatka and Chukotka. We added text about Salmonidae spawning grounds and land temperature change in the discussion section.

Revised text L 249-254:
"Rivers in Kamchatka and near-by Chukotka are spawning grounds for Salmonidae 57 and are known as a diversity hotspot; for example, all species of Pacific salmon (Oncorhynchus) occur in the area. Summer temperature reconstructions from Chukotka show a similar trend to SST reconstructions from our core 43 with lowest values during the pre-15 ka and an about 5°C rise until the early-to-mid Holocene temperature maximum 58 ." Reviewer Comment: l. 179: Do increased freshwater inputs bring more nutrients overall? Our response: Yes, although the processes are a bit more complex. We included a sentence and a reference. Revised text L 259-261: "While freshwater discharge is a source of organic carbon and nutrients that can enhance productivity, the lower salinity of surface waters increases vertical water column stratification 58 ." Included reference: "58.
Heikkilä, M., Ribeiro, S., Weckström, K. & Pieńkowski, A. J. Predicting the future of coastal marine ecosystems in the rapidly changing Arctic: the potential of palaeoenvironmental records. Anthropocene 100319 (2021) doi:10.1016/j.ancene.2021.100319." Reviewer Comment: l. 186: Could N-fixation Cyanobacteria also be an advantage? Please elaborate. Our response: We included some information about the surface-area-to-volume ratio. Revised text L 263-269: "Under nutrient depleted conditions, cyanobacteria are assumed to have several advantages such as the small surface area-to-volume ratio of their cells and thus lower nutrient requirements 56 , the capability of nitrogen fixation (here limited to Nostocaceae), and their preference of inorganic nitrogen in the form of ammonium. Ammonium has an energetic advantage compared to nitrate, the preferred form by diatoms, which has to be transformed to ammonium within the cell at an energetic cost 64 ." Reviewer Comment: Discussion: The evidence presented for a shift in phytoplankton size is quite convincing, but I would like to see some comments on the change in other functional aspects in primary producers that may offer alternative or complementary explanation on the impact of this shift in composition on the ecosystem e.g., N-fixation by cyanobacteria and role in the biogeochemical cycling. Our response: See response above. We included some information about N-fixation by cyanobacteria. Revised text L 263-267: "Under nutrient-depleted conditions, cyanobacteria are assumed to have several advantages such as the small surface area-to-volume ratio of their cells and thus lower nutrient requirements 63 , the capability of nitrogen fixation (here limited to Nostocaceae), and their preference for inorganic nitrogen in the form of ammonium. Ammonium has an energetic advantage compared to nitrate, the preferred form by diatoms, which has to be transformed to ammonium within the cell at an energetic cost 64 ." Added reference: 64. Glibert, P. M. et al. Pluses and minuses of ammonium and nitrate uptake and assimilation by phytoplankton and implications for productivity and community composition, with emphasis on nitrogen-enriched conditions. Limnology and Oceanography 61, 165-197 (2016).
Reviewer Comment: l. 203-207: It is potentially interesting that such fewer reads were found from the benthic groups, but it would be best to discuss number of reads assigned from rarefied reads only, not both. Raw reads might differ in different samples based on library success, which could be confounded with e.g., sediment properties during different time periods (e.g., more abundant classified reads in Suppl. Fig. 1, 13-19 ka yrs BP). The difference between benthic and pelagic reads would be more effectively presented by plotting a Pelagic:Benthic ratio along Figs 2 and 4. This part is also somewhat speculative, as a lot of questions remain as of the taphonomy of ancient DNA in sediments.
Our response: This is an interesting point. We now present the ratio (based on resampled read counts) of pelagic:benthic families and added the ratio to Figures 2 and 4. We also tested this ratio for correlations with SSTs (Spearman's rho = -0.39, p-value = 0.052) and IP25 (Spearman's rho = 0.15, p-value = 0.48) and the added results to the main text.  Revised text L 283-289: "The overall lower number of reads attributed to benthic (263,193 originally, 45,975 after resampling) in comparison to pelagic (476,058 originally, 210,800 after resampling) organisms could suggest that upper-ocean productivity is higher than benthic productivity at the deep continental margin. However, the low assignment rate, particularly among benthic fauna may, to an unknown extent, also relate to the large number of undescribed species and the sequence database gap of deep-sea biota in general 68 . Some typical deep-sea biota were detected such as free-living nematodes (Oxystominidae, 0.1%) albeit only with a few reads (Supplementary data (file 2))." L 297-301: "The ratio between counts assigned to pelagic and benthic families (pelagic:benthic ratio) is highest in samples dated to the late glacial and Younger Dryas compared to the warmer Bølling/Allerød and the Holocene (Figs. 2,4). The pelagic:benthic ratio shows a weak,  Reviewer Comment: l. 475: Having control over sea-ice from the same sediment profile is a strong asset of this MS, which would be highlighted further by stating clearly early on in the MS and/or detailed in methods that these records are derived from the same core (if Fig. 2, add also intext). However, the lack of it for the first and l.475 'last two for SST and the last four for IP25 data' samples (here there actually seem to be an offset with the data presented in Figs. 2 and 4, where SST and IP25 records appear longer by 1 sample or so) limit the conclusions, particularly for the early phase. The SST record available in the Northeast Pacific appears to have incursions into warm conditions at the very base of the record, which could correspond to 'ice-free conditions', or perhaps not well represented by the averaging of the closest samples (same applies for the top-most sample). Please provide further justification for the extrapolation, or remove samples from statistical analyses with SST and IP25 for these intervals and adjust conclusions accordingly. Our response: We interpolated the variables SST and IP25 (see figure below) to address this issue with a new method. Our extrapolation at the core base is based on the assumption of stable sea-ice conditions between 20 and 18 ka. This assumption is consistent with several proxy records from several cores across the Bering Sea and the Sea of Okhotsk which all show that winter sea ice coverage was stable during the time between 18 to 20 kyr BP 11,12,14 . We extrapolated a stable value of IP25 toward the base and are confident that this is the best knowledge based extrapolation method and also superior to deleting the entire samples from data analyses. Figure 11. Interpolated and extrapolated values for the environmental variables IP25 (µg g -1 sediment) and SSTUk'37 (°C).

Supplementary
Revised text L 70-73: "A regional palaeoceanographic framework for past sea-ice cover in the Bering Sea is based on reconstructions using the highly branched isoprenoid alkane IP25 12 which is produced by diatoms that are bound to a life in sea ice 10 and diatom microfossils 11 ." L 109-113: "The paleoenvironmental backbone is based on previous multi-proxy reconstructions of the coring site derived from this core, including diatom microfossils and the biomarker IP25, which show that the coring site was covered by seasonal sea-ice during the Last Glacial Maximum (LGM), Heinrich Stadial 1, and the Younger Dryas, while the phases of the Bølling-Allerød and the Holocene were predominantly sea-ice free 12,16 ." MY COMMENTS: I am not familiar with metabarcoding studies of ancient eDNA from marine sediments, however to my knowledge this could be one of the first applications of shotgun metagenomics in such context, at once providing a paleoecological record of the largest breath of lifeforms over the longest period. The strong record for fish, in particular, is a major advance for the marine-based and sedaDNA field, which is also of potential great significance for evaluating long-term changes in fish stocks and fish ecological and evolutionary dynamics in the future, which remain difficult to quantify in oceans today.
Other papers using shotgun sequencing on marine sediments that I am aware of: Gaffney et al 2020 Geosciences, who presented plant sedaDNA from a tsunami sediment deposits, which did not aim to reconstruct paleoecological dynamics. Eukaryote composition from shotgun metagenomics were also compared to metabarcoding data from Santa Barbara Basin deep-sea sediments over the Holocene, mainly focusing on diatoms (Armbrecht et al 2021 ISME).
Reviewer Comment: Futhermore, the authors go beyond listing taxa and provide an elegant and straightforward statistical analysis of their molecular data against environmental variables by using correlation networks (please see comments below). Altogether, their novel approach allows them to measure the effects of (long-term) climate change on important ocean processes like primary production and food-web interactions at an unprecedented level of detail and taxonomic breadth. The relevance of this work as 'analogous' to ongoing sea-ice loss trends in subarctic coastal environments that is discussed by the authors in Discussion could be emphasized in the introduction already by describing how the 'seasonal sea-ice' to 'see-ice free' conditions over the period studied have changed and relate to documented ongoing and expected e.g., IPCC trends for relevant regions. Our response: We included the information into our introduction.

Revised text L44-46:
"Long-term trends in the Bering Sea show an ongoing decline of sea-ice duration which is projected to decline further due to later freeze-up (by 20 to 30 days) and earlier break-up (by 10-20 days) until the middle of this century 3 ." L 65-68: "Past ecosystem responses to sea-ice loss at the Pleistocene-Holocene boundary in the southern part of the Bering Sea could possibly reveal such long-term rearrangements and serve as an analog for future changes in the Pacific Arctic." 4. Data and methodology: Your assessment of the validity of the approach, the quality of the data, and the quality of presentation. We ask reviewers to assess all data, including those provided as supplementary information. If any aspect of the data is outside the scope of your expertise, please note this in your report or in the comments to the editor. We may, on a case-by-case basis, ask reviewers to check code provided by the authors (see this Nature editorial for more information). Reviewers have the right to view the data and code that underlie the work if it would help in the evaluation, even if these have not been provided with the submission (see this Nature editorial). If essential data are not available, please contact the editor to obtain them before submitting the report.

MY COMMENTS:
Reviewer Comment: Outliers: At least three (1.08, 5.6 and 9.84, cal ka yrs[10] [11] [HZ12] ) of the samples presented seem to be outliers in the dataset (own branches on dendrogram Figs 2 and 4, very distant from neighbours), and I suspect the low raw, classified pre-and post-read counts exclusion of taxa and resampling may explain this as an artifact, rather than a true signal. This could be showed by presenting e.g., a PCoA of the samples (with and without controls) with a biplot of the main taxa before and after exclusion of taxa in Suppl. I think removing them would change correlations and interpretations slightly (likely strengthen them), and help present the counts from the Figures 2 and 4 on clearer scales. Please provided the data used to plot Supplementary File 1 in table form, and please also provide read counts before and after exclusion of taxa (described on l. 409-424). Our response: The presentation of this detailed table required a re-processing of the entire data analyses, which was only possible against a newer version of the non-redundant reference database (downloaded April 2021 before it was May 2020) with the Kraken2 classifier 2.0.8-beta instead of 2.0.7-beta. The usage of a newer database and an upgraded version of the classifier Kraken2 changed our data slightly but did not affect any of our conclusions. Therefore, all analyses and plots have been updated and the samples which are mentioned as outliers, do not appear as outliers anymore (Supplementary Figs. 1,2). We now provide all processing steps as Supplementary data including raw read counts, read counts after deduplication, filtering and trimming, classification and resampling of selected families. All text, tables and figures were updated according to the results of the reprocessed data. The PCoA based on Bray Curtis dissimilarities shows that the overall compositions of the three samples in question cluster with other samples from the Holocene. Only after taxonomic filtering and resampling their compositions are markedly different. The PCoA shows that the blanks have a distinct composition which is highly dissimilar from the samples ( Supplementary Fig. 2). Reviewer Comment: All figure captions and tables titles: specify whether reads presented are rarefied or not. Our response: Information included. Revised caption of Fig 3:  Figure 3. Correlation network of pelagic families. Families (nodes) that are positively correlated (Spearman rank correlation coefficients > 0.4, adjusted p-value < 0.1) with each other are connected by links (edges) after resampling. The node size is log-scaled according to taxa abundance, while node colors represent (a) the functional group to which the organisms belong, and (b) whether the node showed a positive trend (Spearman rank correlation coefficients > 0.4, adjusted p-value < 0.2) with the seasonal sea-ice biomarker IP25 42 (blue) or SSTs 44 (red). Node labels are given if a family exceeds 53 read counts." Reviewer Comment: Fig. 2: Please consider using the same scale for rel. abundances. At present it is difficult to see groups most recorded and transitions. Also, as they are resampled and not actual read-counts, I suggest presenting into percentages for ease of comparisons, also to be more consistent with numbers reported in text.Figs 2 and 4: Boxes for the ice-free ecosystem are different, which one is correct (Fig. 4?)? Consider marking the Families (significantly) positively correlated to SST and IP25 in red and blue in Figs 2 and 4 for ease of interpretation. l. 142: Add climatic periods mentioned in test to Figs 2 and 4. Our response: We combined all comments provided by Reviewer 2 regarding Figures 2 and 4 here and addressed them all during the revision. Please find the revised figures below. We changed it to percentages. On a similar scale a few taxa dominate the plot and make it impossible to see temporal changes for the rest of the families in the plot. Therefore, we are keeping the different scales, but refer to it in the Figure caption. We adjusted the boxes.

Supplementary
Revised Fig. 2:   Figure 2. Pelagic families in the sedaDNA record of core SO201-2-12. Only families represented in at least 3 samples and by at least 1.5% proportion of read counts after resampling are shown. The dendrogram (based on constrained hierarchical clustering) and IP25 concentrations from the same core 40 and sea-surface temperature (SSTUk'37) reconstructions of the Northwest Pacific 41 (blue) and Northeast Pacific 42 (black) are added for comparison. Families which are significantly, positively correlated with IP25 or SSTUk'37 are highlighted in blue or red, respectively. The pelagic:benthic ratio (black line) shows the proportion of reads assigned to pelagic families in relation to reads assigned to benthic families. The ratio between reads assigned to phototrophic bacteria and reads assigned to phototrophic protists is given as blue-green line. Black triangles next to the IP25 record mark radiocarbon-dated calibrated ages 41 . LGM = Last Glacial Maximum, HS1 = Heinrich Stadial 1, B/A = Bølling/Allerød, YD = Younger Dryas.
Revised Fig. 4:   Figure 4. SedaDNA record of benthic families of core SO201-2-12. Only families represented in at least 3 samples and by at least 1.5% proportion of read counts after resampling are shown. The dendrogram (based on constrained hierarchical clustering) and IP25 concentrations from the same core 40 and sea-surface temperature (SSTUk'37) reconstructions of the Northwest Pacific 41 (blue) and Northeast Pacific 42 (black) are added for comparison. Zosteraceae, which are significantly, positively correlated with SSTUk'37, are highlighted in red. The pelagic:benthic ratio shows the proportion of reads assigned to pelagic families in relation to reads assigned to benthic families. The ratio between reads assigned to phototrophic bacteria and reads assigned to phototrophic protists is given as blue-green line. Black triangles next to the IP25 record mark radiocarbon-dated calibrated ages 41 . LGM = Last Glacial Maximum, HS1 = Heinrich Stadial 1, B/A = Bølling/Allerød, YD = Younger Dryas.
Reviewer Comment: l. 83: 72.5% of classified read pairs overlapped and were merged. Is the total read number on l.76 including overlapping read pairs (reads from 'the same DNA strand' counted twice?). Please clarify this elsewhere also, it's important that the merged reads are not counted twice. Our response: Merged reads were not counted twice. To avoid any confusion we now provide the number of read pairs instead of merged or paired reads and adjust the text accordingly.
Revised text L 132-133: "Of those, we were able to classify 13,119,146 read pairs (merged and paired read pairs together)." Reviewer Comment: l. 190, Fig. 3: Nice figure, well thought out and functional groups easy to distinguish. Please add legend for equivalent % for size of nodes (same applies to other network provided in suppl. materials). Add explanation to caption for labelling of nodes or not (some nodes don't have labels, likely because they have too low %). Some family names are difficult to read and associate to the nodes, consider placing them more off from one another with black lines to the nodes, include fewer names or replace with images of taxa as per Fig  5 where appropriate.

Our response:
We added a node size legend and included the threshold for labelling nodes into the caption. After re-analysis with the newer Kraken2 database (see response above), Figure 3 changed slightly. Families (nodes) that are positively correlated (Spearman rank correlation coefficients > 0.4, adjusted p-value < 0.1) with each other are connected by links (edges) after resampling. The node size is log-scaled according to taxa abundance, while node colors represent a the functional group to which the organisms belong, and b whether the node showed a positive trend (Spearman rank correlation coefficients > 0.4, adjusted p-value < 0.2) with the seasonal sea-ice biomarker IP25 40 (blue) or SSTs 42 (red). Node labels are given if the family exceeds 53 read counts.

Revised
Reviewer Comment: l. 322 The chronology was reported in a 2012 publication; please indicate whether the same, 2009 core was used, and if so, indicate the date or year and procedure for the splitting of the core that was used for the DNA work. Cores stored unopened vs opened for many years prior to DNA sub-sampling may be differently affected by secondary growth of e.g., fungi, yeast and Bacteria that would influence DNA results (as per stated by the authors). If different cores were used, please describe the splitting procedure and how those were correlated for the age-depth model. Our response: The archive half of the same core was used. We included the underlined information into the manuscript and added a dedicated paragraph "Opening of cores and subsampling procedures to prevent contamination" to the supplement. Revised text L 395-397: "In 2018 we collected and extracted samples from the archive half of a 9.05 m piston corer (SO201-2-12KL, Fig. 1) which was retrieved during RV Sonne cruise SO-201 "KALMAR" in 2009 43 and since then stored at 4°C." L 400-402: "The sampling was carried out at GEOMAR, Kiel, in a sedimentology laboratory devoid of any molecular biology work. Details about the subsampling procedure and the opening of the cores can be found in the Supplementary Information." New paragraph in Supplementary Information: "During the expedition on RV Sonne cruise SO-201 "KALMAR" in 2009, the core liner was cut into 1 m sections on board of the ship, which is common practice. After measuring the magnetic susceptibility (not part of the manuscript) on the round-sections, each 1m-section was split into working and archive halves. The working halves were sampled onboard and completely used up, providing sample material, commonly in 1 cm-slices, for various working groups. Color reflectance measurements (not part of the manuscript) were made on the archive half sections, which were covered with transparent polystyrene foil before measurement. The archive halves were packed into plastic D-tubes and stored at ~4°C, within a few hours of recovery." Reviewer Comment: l. 327 As the lab facilities used for sub-sampling are not a dedicated clean facilities, please include the precautions that were undertaken to limit contaminants (air, surfaces, tools) from the wide range of organisms studied at GEOMAR, like fish and phytoplankton. Our response: We included the following description of sub-sampling procedures and localities in a dedicated chapter in the Supplementary Information. Revised text in introduction L 400-402: "The sampling was carried out at GEOMAR, Kiel, in a sedimentology laboratory devoid of any molecular biology work. Details about the subsampling procedure and the opening of the cores can be found in the supplement." Added text in supplement: "The sampling was carried out at GEOMAR, Kiel, in a sedimentology laboratory devoid of any molecular biology work. The lab is not used for experiments on modern phytoplankton, fish, or other living organisms. The surface of the table and the core liner were cleaned DNAExitus Plus and rinsed of with MiliQ water. The plastic foil covering the sediment was removed sample-by-sample to limit exposure of the sediments to the surrounding air. Clean, single-use knives (soaked in DNAaway for 10 minutes, rinsed with 70% Ethanol and irradiated for 20 min on each side in a UV-crosslinker) were used to remove ~2mm of the exposed sediment. Then a sterile scalpel blade was used to remove a second layer of sediment (~2 mm). A syringe, of which the front was cut off directly before, was inserted aiming for a sample volume of up to 4 mL. The sample was transferred into a sterile 8 mL tube and stored at -20°C until DNA extraction. During the whole procedure, full body suits, exchangeable arm sleeves, face masks, and two pairs of gloves on top of each other were worn." Reviewer Comment: l. 345: Please clarify in text whether the same extracts as ref 15 were used (I assume this is the case, as the same extract controls are analysed), or whether new extracts were made from the samples. If the same extracts were used, please clarify that the protocol is exactly as per ref 15 from the first sentence of this paragraph. Our response: We used the stocks from the same extracts that were kept continuously frozen after the extractions while working aliquots were used for DNA metabarcoding. Revised text L 410-: "For a previous DNA metabarcoding analysis 20 we extracted DNA for 63 samples in 7 batches where each batch contained up to 9 samples and an extraction blank (in total 7 negative controls). From those, we chose the stock solutions of 25 DNA extracts and the corresponding extraction blanks, which had been kept continuously frozen since the extraction for the metagenomic shotgun sequencing." Reviewer Comment: l. 376: Were the controls also provided with 10-14 cycles? This may explain why such few reads could be sequenced. Our response: The extraction and library blanks were treated like the samples and were amplified with 10-14 cycles, we added this information to the main text. The low number of read counts is a result of the low DNA content in our blanks. We additionally added the information about the data processing for samples as well as for negative controls and prepared it in table form as Supplementary data (file 1). Related to another reviewer comment below, we also included a dedicated section ("3. Negative controls") about read counts of the blanks into the revised Supplementary Information.
Revised text L 445-448: "For samples the subsequent double-indexing was performed by PCR in 10-14 cycles depending on library concentration [...]. Blanks were amplified with the same number of cycles, although initial DNA concentration was 0 ng/µL. "

New section in supplement:
"Throughout the library preparation process the negative controls were treated in the same way as the samples, also in terms of PCR cycle numbers for the indexing step. We retrieved between 395,384 and 2,494,694 read pairs for the different negative controls of which between 0.01 and 30% were duplicates. After deduplication, trimming and filtering for low complexity, low quality, residual adapters and length only between 0.3 and 2.1% of the original read pairs were left. This suggests, our blanks contain very low levels of DNA and that increasing cycle numbers would predominantly amplify our duplication rate and increase the number of adapters in the sequencing data." Reviewer Comment: l.390. A 5% replication rate seems impressively low; is this in a similar range from other sedaDNA studies? Could the authors provide the distribution from Suppl. Fig. 1

for all samples in table form? Apologies if I have missed this.
Our response: We now provide deduplication results from each sample and negative controls in a table form (see Supplementary File 1 "data_processing.xlsx" in the Supplementary Material). To make sure of the duplication rate in our data we re-analyzed the raw data with a different tool (clumpify, BBMap package, https://www.biostars.org/p/225338/) than originally used in the manuscript (FastUniq v. 1.1). Both methods remove only exact duplicates from the data and calculate a very similar average duplication rate. FastUniq v. 1.1 calculated a mean duplication rate (including blanks and samples) of about 5.3% and clumpify calculated on average 10.6%. Highest deviations between both approaches are detected in the duplication rates in the blanks (see Figure below this text, only for response letter), whereas within samples the rates are very similar (FastUniq=4.2%, clumpify=7.2%). The high similarity of duplicate rates in samples found in the two different approaches support the correctness of the duplication rate in our data. This finding is also supported by the publication by Gaia et al. 2019 who compared different deduplication tools and showed a high comparability between the tools. https://www.nature.com/articles/s41598-019-48242-w/tables/1 Unfortunately, duplication rates are usually not distinctly reported from sedDNA studies. We found one publication by Carpenter et al. 2013 on ancient human teeth and hair samples (https://www.sciencedirect.com/science/article/pii/S000292971300459X) which reported of a similar duplication rate ranging between 2-20% identified by a deduplication tool in the SAMTools package (https://academic.oup.com/bioinformatics/article/25/16/2078/204688?login=true). Revised text L 459-466: "We used FastQC v. 0.11.9 83 to check the quality of the raw data followed by duplicate removal with FastUniq v. 1.1 84 which reduced the number of read pairs from 918,186,452 to 876,349,245 (Supplementary data (file 1)). We additionally applied clumpify (BBMap package) to perform deduplication. Both approaches resulted in comparable duplication rates. The average of sample and blank duplication rate for FastUniq is 5.3% and 10.6% for clumpify. The average duplication rate within samples is FastUniq=4.2% and clumpify=7.2 %, when considering blanks only the approaches identified a duplication rate of FastUniq=9.5% and clumpify=22.5 %, which indicates a better performance of clumpify to remove duplicates in the blanks. Afterwards we used Fastp [...]".

New Supplementary data (file 1):
This data file will be uploaded as Supplementary data. Reviewer Comment: l. 409-424: I think the approach for inclusion/exclusion of taxa is reasonable, however it would be useful to include the read counts that were removed by this 'exclusion step' from the initial read count at the classification at Family level step. Pls see additional comment on damage patterns. Our response: We only excluded taxa on family level and did not remove genera or species read counts from a family. The number of read counts before and after the exclusion step on family level (for merged and unmerged read pairs) are presented as Supplementary data file 1.
Reviewer Comment: l. 425 Though I acknowledge that similar threshold approaches have been used on similar datasets and at least partly agree with the reasoning behind it, keeping 'only families that occurred in at least 3 samples with at least 10 counts' prior to re-sampling (or other form of normalization) increases the risk of removing true positives, rare taxa that may by chance (or because of sediment matrix effects on library building success) occur only in shallower sampled intervals of the core. Could the authors comment on this effect in relation to their data? Our response: The manuscript builds upon the correlation analysis for which we require a stable signal of trends. Many taxa with single occurrences lead to the problem of zeroinflation to which the correlation analyses are sensitive. Opposed to the analysis of the presence/absence of single taxa, our goal was to decrease the false positives rate for which we had to consider the risk of missing true positives. Thus, for our sample size we determined a stable signal if families occured in at least 3 samples and with at least 10 read counts prior to resampling. Revised text L 500-506: "Finally, we kept only families that occurred in at least 3 samples with at least 10 counts, and resampled the dataset 500 times to a pelagic sample effort of 6,593 counts and a benthic sample effort of 1,839 counts per sample to circumvent bias arising from different sequencing depths across the samples 1 to capture the main compositional trend and allow for stable correlation signals. This measure was also effective in reducing zero-inflation of our dataset for the correlation network analysis and the few affected taxa have no significant role within the network (Supplementary Fig. 12)." Reviewer Comment: Negative controls: It is good practice that the authors included a number of negative controls from the extraction step in the sequencing, however the number of reads from them is quite low (total of ~500 classified reads from all controls combined). It is actually common for controls to yield relatively high levels of reads assigned compared to samples, because human-related taxa are overall more commonly sequenced and thus more represented in reference databases (erroneously or not), and it would be also important to know many raw reads were sequenced for controls (maybe I have missed this information). Please provide the read counts after the different bioinformatic steps for the controls as per the samples. Our response: We added the information for negative controls and prepared it in table form as Supplementary data (file 1). We also included a dedicated section ("3. Negative controls") about read counts of the blanks into the revised Supplementary Information (see revised text below).
Revised text in the Supplementary Information: "Throughout the library preparation process the negative controls were treated in the same way as the samples, also in terms of PCR cycle numbers for the indexing step. We retrieved between 395,384 and 2,494,694 read pairs for the different negative controls of which between 0.01 and 30% were duplicates. After deduplication, trimming and filtering for low complexity, low quality, residual adapters and length only between 0.3 and 2.1% of the original read pairs were left. This suggests, our blanks contain very low levels of DNA and that increasing cycle numbers would predominantly amplify our duplication rate and increase the number of adapters in the sequencing data." Reviewer Comment: Ideally, one would provide similar sequencing effort to controls as for samples, however such a standard has not yet been established in the field and 'best practice' approaches from previous works have varied, none reaching this ideal (to my knowledge). Furthermore, it can be challenging to obtain sufficient amounts of reads from controls for sequencing, that would indeed contain very little DNA (increasing the number of indexing PCR amplification cycles usually helps, but also result in high levels of duplication). That being said, I do not have a reason to dispute the authenticity of the results provided for the main taxa detected supporting the MS conclusions other than the relatively minor verifications I have pointed out elsewhere, particularly if the damage patterns from at least some of the representants of major groups of organisms can be established.
Our response: Throughout the library preparation process the negative controls were treated in the same way as the samples, also in terms of PCR cycle numbers for the indexing step. After deduplication, quality filtering and trimming of DNA reads (see Supplementary data, Data_processing_steps.xlsx) the blanks contained only between 0.3 and 2.1% of the original read pairs. Comparing this to the samples, we recovered about 56 to 86% of read pairs after the same bioinformatic steps. These results support that the blank controls contain very low levels of DNA and deeper sequencing would most probably result in the sequencing of more adapters and PCR duplicates. With regard to damage pattern analysis, we processed more taxa with the HOPS pipeline and included a dedicated section and additional figures on damage pattern analysis in the Supplement Information, where we also moved part of the lengthy description of the analysis.
Revised text in the Supplementary Information: "Throughout the library preparation process the negative controls were treated in the same way as the samples, also in terms of PCR cycle numbers for the indexing step. We retrieved between 395,384 and 2,494,694 read pairs for the different negative controls of which between 0.01 and 30% were duplicates. After deduplication, trimming and filtering for low complexity, low quality, residual adapters and length only between 0.3 and 2.1% of the original read pairs were left. This suggests, our blanks contain very low levels of DNA and that increasing cycle numbers would predominantly amplify our duplication rate and increase the number of adapters in the sequencing data. Of the 2,183 read pairs (merged and unmerged) 1,519 read pairs were retrieved on family level after filtering (Supplementary Table 2). The blanks are dominated, as expected, by bacterial-and human-derived sequences ( Supplementary Fig. 4). The majority of taxa found in the blanks is however only represented by a single sequence count in total ( Supplementary  Fig. 5). The composition of the blanks is highly similar as shown in the PCoA plot ( Supplementary Fig. 2) and markedly different from the samples." Supplementary Figure 4. The composition of all blanks combined shows that reads assigned to Hominidae and bacterial families are dominant in blanks. All taxa with less than five total read counts in blanks were grouped together.
Reviewer Comment: l. 444: Insufficient number of reads assigned were found for the damage pattern analyses, however pelagic and benthic merged and unmerged counts sum up to potentially higher reads counts than merged and unmerged read counts for some samples. Please clarify. Our response: HOPS can be run either with merged reads or unmerged reads, but not with merged and unmerged reads together. We included the following underlined sentence to improve clarity. Revised text L 521ff: "HOPS can either run for merged or unmerged data, but it cannot combine both. To reach a sufficient number of read counts for running HOPS, we combined all merged reads (unmerged reads were not considered due to the insufficient number of assigned reads) from all sediment core samples into one fastq-file[...]" 5. Analytical approach: Your assessment of the strength of the analytical approach, including the validity and comprehensiveness of any statistical tests. If any aspect of the analytical approach is outside the scope of your expertise, please note this in your report or in the comments to the editor.

MY COMMENTS:
Reviewer Comment: I'm not very familiar with correlation networks, but I find the use here to be useful in highlighting differences in community assemblages and function under different climatic regimes. I would like for the authors to address or comment on a couple of questions that would relate to the structure of ancient metagenomic datasets could be addressed in the MS: What is the sensitivity of correlation networks to detect non-linear relationships between taxa, especially over centennial to millennial timescales? Do biogeographical distribution patterns matter on this scale? How is this statistical approach handling the zero-inflated dataset? Trends in the dataset (vs environmental variables) could perhaps be further explored with GAMs (e.g., Simpson, G. L. Modelling palaeoecological time series using generalised additive models. Frontiers in Ecology and Evolution 149 (2018). Our response: Below, we have addressed the three posed questions separately. In summary, Spearman method is a non-linear rank-based correlation, which detects similar (with Spearman non-linear) shifts of abundances over time. Therefore, it is better suited to address correlations in our dataset than, for example the Pearson correlation coefficient. Our additional analyses have shown that (1) zero inflation is not strongly pronounced in our dataset, (2) the affected taxa have no significant role within the network and (3) that zeroinflation doesn't have an effect on the creation of the network (Supplementary Fig. 12).

Reviewer Comment: What is the sensitivity of correlation networks to detect non-linear relationships between taxa, especially over centennial to millennial timescales?
New chapter in Supplementary Information: "Spearman method is a non-linear rankbased correlation, which detects monotonic relationships. This approach to infer a network focuses on the generation of a co-occurrence network, where links between taxa don't necessarily mean a direct relationship or even interaction, but rather similar (with Spearman non-linear) shifts of abundances over time. On this time scale each sample represents a local snapshot of a distinct state defined by major environmental drivers like SST or sea ice. Identifying actual relationships between the taxa would require a higher resolution of the time series. We made further efforts to generate networks based on the ecoCopula method which considers environmental covariates as well as mediator taxa. These analyses have shown that the co-occurrence network based on Spearman correlation have a Jaccard Similarity of 0.34 (positive correlations) with networks inferred with ecoCopula. This result shows that the statements of our analysis based on the co-occurrence network remain consistent, considering that the ecoCopula network has fewer edges as well as rewiring without a large impact on the network."

Do biogeographical distribution patterns matter on this scale?
Revised text L 203-207: "Despite removing the environmental effects of IP25 and SSTs for network generation, the green module is mainly composed of families that are positively correlated with IP25 in the Spearman network suggesting a strong co-occurrence pattern of these families based on biogeographic distribution patterns or other environmental factors not included here, such as nutrient availability, salinity, or light conditions." How is this statistical approach handling the zero-inflated dataset? Revised text L 500-504: "Finally, we kept only families that occurred in at least 3 samples with at least 10 counts, and resampled the dataset 500 times to a pelagic sample effort of 6,593 counts and a benthic sample effort of 1,839 counts per sample to circumvent bias arising from different sequencing depths across the samples 1 to capture the main compositional trend and allow for stable correlation signals. This measure was also effective in reducing zero-inflation of our dataset for the correlation network analysis and the few affected taxa have no significant role within the network ( Supplementary Fig. 12)."

Revised text in Supplementary Information:
To evaluate the influence of zero inflation on the network, we assessed the distribution of the abundance matrix ( Supplementary Fig. 12b), analyzed which role zero-inflated taxa have within the resulting networks ( Supplementary Fig. 12a). These analyses have shown that (1) zero inflation is not strongly pronounced in our dataset, (2) the affected taxa have no significant role within the network and (3) that zero-inflation doesn't have an effect on the creation of the network. Revised text L 297-301: "The ratio between counts assigned to pelagic and benthic families (pelagic:benthic ratio) is highest in samples dated to the late glacial and Younger Dryas compared to the warmer Bølling/Allerød and the Holocene (Fig. 4). The pelagic:benthic ratio shows a weak, negative trend with SSTs (ρ = -0.39, p-value = 0.052), indicating that fewer reads are assigned to benthic families in phases of warmer SSTs."

Reviewer Comment:
To what extent do the correlations between families may be affected by reference bias within closely related taxa, where for instance, only a few members of a family with differential degree of coverage may be available for mapping in NCBI? How could this affect interpretations (e.g., l. 155)?
Our response: Generally, if closely related species of the same family are represented by a reference genome, kraken2 handles the taxonomic assignments well, based on unique minimizers and a LCA approach (Wood et al. 2019). If a family is not represented in the database, a sequence cannot be assigned to this family. If a sequence can equally likely be assigned to two different families, the kraken2 algorithm will place the taxonomic assignment on a higher level. The completeness/incompleteness of the database however does not influence temporal trends. The most expected fish families based on their role in the modern western Bering Sea food web are the Salmonidae, Clupeidae, Gadidae. Our database analysis shows that these are well represented through reference genomes of several genera (Supplementary Table 2, please find below the summary of the table). We dedicated a new section to the analysis of the database with special regard to fish into the Supplement. In summary, our data suggest that if a family is represented by a genome, kraken2 can manage the taxonomic assignment on this level. Therefore, we are confident that assignments to these families are valid.

Novel chapter in Supplementary Information: Could correlations be influenced by a reference database bias of closely related taxa?
"To show whether correlations could be influenced by a reference database bias, we performed a test based on the group of fish, which have a sufficiently large number of read counts. The most expected fish families based on their role in the modern western Bering Sea food web are the Salmonidae, Clupeidae, and Gadidae 3 . Our database analysis shows that these are well represented through reference genomes of several genera (Supplementary Table 2). The analysis shows that for Clupeidae and Gadidae, only genera from the study area determine the signal on family level ( Supplementary Fig. 7b,c). Among Clupdeidae, 48 genera are represented in the database and 99.3% (2,877counts) of the reads are assigned to the genus Clupea, while the rest (6 counts) are assigned to 5 other genera ( Supplementary Fig. 7b). Among Gadidae, our reads were assigned to 8 of the 12 genera represented in the database. All of them except for Micromesistius (2 counts assigned) are present in the Bering Sea or the Pacific Arctic. The majority of reads were assigned to the genus Gadus (22,411counts), which is driving the trends on family level. Notable, but few assignments were found for the genus Boreogadus (Polar cod; 107 counts). Today, Polar cod is a keystone species of the high-Arctic with a strong linkage to sea ice 4 , which supports their presence only during the late glacial in our samples.
The Bering Sea, and the Kamchatka area in particular, is a diversity hotspot for Salmonidae and provides spawning grounds for the genera Oncorhynchus, Coregonus and Salvelinus, which are all represented in our data on genus level. Of the overall 11 genera represented in the database, counts were mostly assigned to Oncorhynchus (2,648 counts), Salvelinus (613 counts), Coregonus (17,849 counts), and Salmo (18,090 counts), while only 6 counts were assigned to Thymallus and 2 to Hucho, which do not occur in the Bering Sea. The high assignment of reads to the genus Salmo could be a result of its high representation in the database or potential gaps in assemblies of other Salmonidae. This suggests that analyzing our data on family level is more appropriate than on lower taxonomic levels.
In summary, our data suggest that if a family is represented by a genome, kraken2 can manage the taxonomic assignment on this level. Therefore, we are confident that assignments to these families are valid." Supplementary Table 2 (summary). Read counts for planktic, pelagic, and benthic taxonomic groups before resampling. Reviewer Comment: Could the author also comment on the potential impact of downcore autocorrelation between samples for correlation networks and overall correlations with [UH15] environmental variables to strengthen the conclusions? Our response: We assessed temporal autocorrelation for each taxon and can conclude that overall, our data are not prone to temporal autocorrelation. We have provided an overview of the autocorrelation plots for the most important families in the supplement ( Supplementary  Figs 8 and 9). And added the following sentence in the first paragraph of the statistical analysis section. Correlations with IP25 and SSTs (including adjusted p-values) are given in Supplementary Table 4. Revised text L 530-532: "We tested each of the pelagic and benthic taxa for temporal autocorrelation using acf from the stats package and conclude that our trends and correlation networks are not influenced by temporal autocorrelation (Supplementary Figs. 8 and 9)."

Supplementary Figures 8 and 9:
Supplementary Figure 8. Temporal autocorrelation plots of some of the most abundant pelagic taxa among (a-d) fish, (e-f) cyanobacteria, (g-h) diatoms, and (i) chlorophytes suggest that downcore temporal autocorrelation between samples is not affecting the correlation network. The lag 0 autocorrelation is fixed at 1. Figure 9. Temporal autocorrelation plots of some of the most abundant benthic taxa suggest that downcore temporal autocorrelation between samples is negligible. The lag 0 autocorrelation is fixed at 1.

Reviewer Comment:
All relative abundance datasets suffer from closed sum effects. To further support one of the main conclusions of the MS about the change between smaller and larger celled primary producers from the pelagic environment, the authors could test the strength of the correlation between the ratio of read abundances of small:large phytoplankton and their environmental variables. Our response: A classification on cell size only is problematic, as we know that pico-sized chlorophytes, like the dominant family Bathycoccaceae, which are in our data also positively correlated with seasonal sea ice data, prefer colder conditions. However, as we are sure about the small cell size of bacteria, we calculated the ratio between reads assigned to phototrophic bacteria, which are classified as pico-sized bacterioplankton, and reads assigned to all remaining phototrophic protists. The ratio (photo.bacteria:photo.protist) is positively correlated with SSTs (Spearman's rho =0.35, p-value 0.088, see Figure for review only below) with a weak statistical significance which is potentially caused by the few number of samples. This positive relationship supports that pico-sized bacteria in general dominate over phototrophic protists under warming temperature which gives further support for the main conclusions drawn in our manuscript.
Revised text L 341-346: "Phytoplankton groups with large cell sizes such as diatoms are assumed to contribute more to carbon export than or other pico-sized plankton such as phototrophic bacteria 78 , which are presumed to be mostly recycled by the microbial loop in the water column 79 . A shift from larger phototrophic protists to smaller bacterioplankton in the course of a sea-ice loss could therefore alter the amount and quality of food sustaining benthic organisms 80 and the efficiency of the biological carbon pump 63 .  Only families represented in at least 3 samples and by at least 1.5% proportion of read counts after resampling are shown. The dendrogram (based on constrained hierarchical clustering) and IP25 concentrations from the same core 40 and sea-surface temperature (SSTUk'37) reconstructions of the Northwest Pacific 41 (blue) and Northeast Pacific 42 (black) are added for comparison. Families which are significantly, positively correlated with IP25 or SSTUk'37 are highlighted in blue or red, respectively. The pelagic:benthic ratio (black line) shows the proportion of reads assigned to pelagic families in relation to reads assigned to benthic families. The ratio between reads assigned to phototrophic bacteria and reads assigned to phototrophic protists is given as blue-green line. Black triangles next to the IP25 record mark radiocarbon-dated calibrated ages 41   IP25 concentrations from the same core 40 and sea-surface temperature (SSTUk'37) reconstructions of the Northwest Pacific 41 (blue) and Northeast Pacific 42 (black) are added for comparison. Zosteraceae, which are significantly, positively correlated with SSTUk'37, are highlighted in red. The pelagic:benthic ratio shows the proportion of reads assigned to pelagic families in relation to reads assigned to benthic families. The ratio between reads assigned to phototrophic bacteria and reads assigned to phototrophic protists is given as blue-green line. Black triangles next to the IP25 record mark radiocarbon-dated calibrated ages 41 . LGM = Last Glacial Maximum, HS1 = Heinrich Stadial 1, B/A = Bølling/Allerød, YD = Younger Dryas.
6. Suggested improvements: Your suggestions for additional experiments or data that could help strengthen the work and make it suitable for publication in the journal. Suggestions should be limited to the present scope of the manuscript; that is, they should only include what can be reasonably addressed in a revision and exclude what would significantly change the scope of the work. The editor will assess all the suggestions received and provide additional guidance to the authors.
MY COMMENTS: I have already provided suggestions in previous sections that would, in my opinion, strengthen the manuscript, and added a few more below.
Reviewer Comment: A major point: damage patterns were assessed for diatoms, but overall conclusions would be stronger if the other major groups found were also assessed like fish (codfish, salmon, herring), cyanobacteria and copepods (the latter may have too low counts). Cnidaria, Echinodermata, Mollusca also had potentially sufficiently high number of reads. Where counts are of sufficient abundance, consider also assessing the damage patterns for some of the main groups that were excluded to show the difference in damage.
Our response: We now assessed damage patterns for two benthic taxa (Asteria rubens, Pecten maximus), four eukaryotic algae (Chaetoceros simplex, Thalassiosira pseudonana, Fragilariopsis cylindrus, Bathycoccus prasinos), one prokaryotic algae (Synechococcus), three pelagic fish (Oncorhynchus kisutch, Salmo trutta and Gadus morhua) and for two taxa that we excluded from the data: 1. freshwater fish Cyprinus carpio (excluded because we assume a taxonomic mis-classification) and one dominant bacteria Marinobacter (excluded because we did not focus on bacterial composition). To apply damage pattern analysis to non-photosynthetic organisms, we used the nt/nr database instead of the cp database (which was used in the former manuscript version) in the automated HOPS pipeline (Huebler et al. 2019). The pipeline uses MaltExtract function to identify the ratio of C>T (5'end) and G>A (3'end) substitutions for a selection of DNA reads (pre-classified as ancient) that have one mismatching lesion in their first 5 bases from either end. Damage pattern profiles of preselected ancient reads are provided for three time periods 1.08-5.6 cal kyr BP (=dataset1), 6.3-12.6 cal kyr BP (=dataset2) and 13.6-19.9 cal kyr BP (=dataset3) and are placed in the Supplementary information (Supplementary Figure 6a-c). The profiles support an increase of damage pattern with increasing age of the samples. Also excluded taxa show the same trend. As we excluded taxa based on either taxonomic misclassification (like for Cyprinus) or because of non-focus taxa (like Marinobacter) we expect ancient DNA damage patterns for such taxa as well.
Revised text in the Supplementary Information: "We confirmed authenticity of pelagic and benthic focal taxa via damage pattern analysis using the automated HOPS pipeline 2 . The pipeline was run in full mode against the NCBI nt database (downloaded 03.11.2020) with options "filter" and "alignment" set to "ancient" and "1", respectively. The required index for the database was built using malt-build with default settings and the newest MEGAN6 3 mapping file (megan-nucl-map-Jul2020.db). The malt alignment of only merged reads against the full nt database was run for two benthic taxa (Asteria rubens, Pecten maximus), four eukaryotic algae (Chaetoceros simplex, Thalassiosira pseudonana, Fragilariopsis cylindrus, Bathycoccus prasinos), one prokaryotic algae (Synechococcus), three pelagic fish (Oncorhynchus kisutch, Salmo trutta and Gadus morhua) and for two taxa that we excluded from the data: 1. freshwater fish Cyprinus carpio (excluded because we assume a taxonomic mis-classification) and one dominant bacteria Marinobacter (excluded because we did not focus on non-phototrophic bacterial composition). The pipeline uses MaltExtract function to identify the ratio of C>T (5'end) and G>A (3'end) substitutions for a selection of DNA reads (pre-classified as ancient) that have one mismatching lesion in their first 5 bases from either end. Damage pattern profiles of pre-selected ancient reads are provided for three time periods 1.08-5. (1.08-5.6 cal kyr BP), b: dataset2 (6.3-12.6 cal kyr BP) and c: dataset3 (13.6-19.9 cal kyr BP). C>T substitutions in 5' direction and G>A substitutions in 3' are given for each taxon analyzed. Non-C>T substitutions in 5' direction and non-G>A substitutions in 3' direction are given to estimate the noise (gray color). The general color code for benthic taxa (Asterias rubens and Pecten maximus) is brownish, eukaryotic algae (euk algae) are greenish, pelagic fish are blueish and bacteria are colored purple. Overall, the profiles show increasing accumulation of damage over the three binned timeframes for pelagic and benthic families.

Reviewer Comment:
Because it is particularly prone to reference bias because of its high representation in NCBI as a model organism, to further validate the signal of Salmon (vs that of other fish), the authors could consider providing information on whether reads fall within unique parts of its genome (vs other fish). Presenting rarefaction curves for each sample and for both pelagic and benthic groups in supplementary would help assessing sampling completeness at the Family level.
Our response regarding Salmonidae: The strength of kraken2 addresses exactly this point. kraken2 connects short genomic substrings (minimizer, based on k-mers) with the lowest common ancestor (LCA) taxa. When 2 distinct k-mers from distantly related genomes share the same minimizer this can either lead to higher LCA values (in case both genomes are part of the database) or incorrect assignments (if one of the genomes is missing). However, the latter problem is stronger on species and genus level than on higher level classifications like family level, especially as our focal families are represented by at least one reference genome. We assessed the state of the database and tested the signal of Salmonidae versus that of other fish families (here the most important families of the Western Bering Sea food web: Gadidae, Clupeidae, Pleuronectidae). These families are covered by several reference genomes across several families (a more detailed version of the summary table presented here is available as a chapter in the Supplementary  Information, p.12), but almost all genera are represented in the database.
Novel chapter in the Supplementary Information: "The strength of kraken2 addresses exactly this point. kraken2 connects minimizers (short substrings used for binning of k-mers) with the lowest common ancestor (LCA) taxa. When 2 distinct k-mers from distantly related genomes share the same minimizer, this can either lead to higher LCA values (in case both genomes are part of the database) or incorrect assignments (if one of the genomes is missing). However, the latter problem is stronger on species and genus level than on higher level classifications like family level, especially as our focal families are represented by at least one reference genome. Our method is also less prone to bias by taxa with a high representation in datasets than a BLAST-based approach. BLAST returns the n-best hits, but a model species can dominate the list of hits and a more closely related species with a better hit might be missed if it comes further down in the database and the threshold of hits has been reached.
We assessed the state of the database and tested the signal of Salmonidae versus that of other fish families (here the most important families of the Western Bering Sea food web: Gadidae, Clupeidae, Pleuronectidae). These families are covered by several reference genomes across several genera (Supplementary Table 3). Furthermore, almost all the corresponding genera from our study area are represented in the database.
One important group that is present in our data, but missing in our correlation analyses, due to very few read counts is the family Pleuronectidae, which contains ground fish such as the Pacific halibut (Hippoglossus stenolepis; genome available), Arrowtooth flounder (Atheresthes stomas; no references in NCBI) and Greenland turbot (Reinhardtius hippoglossoides; genome available). The proportion of Pleuronectidae minimizers (0.12%) among Actinopteri is only slightly lower than the proportion of minimizers for Clupeidae (0.15%). Hence other factors than a reference database bias could be responsible for the lower number of read counts assigned to Pleuronectidae, such as lower past abundance compared to Salmonidae and Clupeidae. Salmonidae have many more unique minimizers in the database than the other fish families (5.21% of Actinopteri minimizers)." Table 3. Read counts for planktic, pelagic, and benthic taxonomic groups before resampling. Here only the summary of the table is provided.

Supplementary
Our response regarding rarefaction curves: We plotted the rarefaction curves and included them with a small description into section "2. Filtering of sequencing data" of the revised supplement. Added text in the Supplementary Information: "We checked completeness of pelagic and benthic taxa within the samples on family level with rarefaction curves. The rarefaction curves reflect that many taxa were cautiously filtered out to get stable signals over time (Supplementary Figure 3). For our purposes however, reaching a saturation of the curves is not essential, because we are not investigating presence/absence of taxa, but rather semi-quantitative trends in ecosystem composition." Supplementary Figure 3. Rarefaction curves for pelagic and benthic families before resampling. The vertical line indicates the lowest number of read counts for each of the datasets (pelagic = 6,593 counts; benthic = 1,839 counts) to which the two datasets were resampled, while horizontal lines mark the number of families detected for each sample after rarefaction 7. Clarity and context: Your view on the clarity and accessibility of the text, and whether the results have been provided with sufficient context and consideration of previous work. Note that we are not asking for you to comment on language issues such as spelling or grammatical mistakes.

MY COMMENTS:
The manuscript was a nice read, and I have made only a few suggestions that may improve clarity and context for the wider readership.
Our response: We thank the reviewer for the suggestion.For IP25 the record does not extend to 20 kyr BP, therefore we previously applied an interpolation method which uses a moving average, with the result that the first values could not be retrieved. We interpolated the variables SST and IP25 (Supplementary Table 3, Supplementary Fig. 10) to address this issue with a new method. Our extrapolation at the core base is based on the assumption of stable sea-ice conditions between 20 and 18 ka. This assumption is consistent with several proxies records from several cores across the Bering Sea and the Sea of Okhotsk which all show that winter sea ice coverage was stable during the time between 18 to 20 kyr BP 11,12,14 . We extrapolated a stable value of IP25 toward the base and are confident that this is the best knowledge based extrapolation method and also superior to deleting the entire samples from data analyses.
Revised text L 70-73"A regional palaeoceanographic framework for past sea-ice cover in the Bering Sea is based on reconstructions using the highly branched isoprenoid alkane IP25 12 which is produced by diatoms that are bound to a life in sea ice 10 and diatom microfossils 11 ." L 108-113: "The paleoenvironmental backbone is based on previous multi-proxy reconstructions of the coring site derived from this core, including diatom microfossils and the biomarker IP25, which show that the coring site was covered by seasonal sea-ice during the Last Glacial Maximum (LGM), Heinrich Stadial 1, and the Younger Dryas, while the phases of the Bølling-Allerød and the Holocene were predominantly sea-ice free 11,15 " Included citations: 10.
Belt, S. T. & Müller, J. The Arctic sea ice biomarker IP25: a review of current understanding, recommendations for future research and applications in palaeo sea ice reconstructions. Quaternary Science Reviews 79, 9-25 (2013).

12.
Méheust, M., Stein, R., Fahl, K. & Gersonde, R. Sea-ice variability in the subarctic North Pacific and adjacent Bering Sea during the past 25 ka: new insights from IP25 and
Revised text L 558-562: "SST 25 and IP25 41 data were interpolated to our sample ages using R function approx 95 . As for the last 2 samples IP25 data could not be extrapolated, we calculated the mean of the last three interpolated values and replaced the missing data based on regional sea ice reconstructions for the LGM 22 (Supplementary Fig. 10)." Novel chapter in the Supplementary Information: "SST and IP25 5 data were interpolated to our sample ages using R function approx 6 . As for the last 2 samples IP25 data could not be extrapolated, we calculated the mean of the last three interpolated values and replaced the missing data based on our knowledge about past sea ice coverage during the time for the study area from diatom microfossil-and biomarker-derived sea ice reconstructions for the LGM 7,8 ." Included citations: 6.
R Core Team. R: A language and environment for statistical computing. (R Foundation for Statistical Computing, 2018).

8.
Matul, A. G. Probable limits of sea ice extent in the northwestern Subarctic Pacific during the last glacial maximum. Oceanology 57, 700-706 (2017). Supplementary Figure 10. Temporal evolution of IP25 and SSTs with interpolated and extrapolated values marked by red dots.

Supplementary
Reviewer Comment: 2. Currently, the analysis carried out suppose that each sample is completely independent from the others. In other words, it assumes that there are no temporal autocorrelations in the data. A test of temporal autocorrelation in the taxon data should be carried out to make sure the interpretation is not affected by temporal autocorrelation. If there is temporal autocorrelation in the data, it will need to be accounted for in the analysis and interpretation. If there is no temporal autocorrelation in the taxon data, it would be an interesting addition to the results that is valuable ecologically because it would refine the understanding gain from this system (e.g. a deeper understand of temporal turnover, or temporal beta diversity, could be assessed). There exist a few options to test for temporal autocorrelation for multivariate data, two that come to mind is to use the approach proposed by Legendre and Gauthier (2013) or Dray et al. (2008) (both proposed techniques have been implemented in the adespatial R package). Another approach is to test the temporal autocorrelation for each taxon independently, this could be done, e.g. with a Moran's I (Moran 1948) or an ARIMA model.

Our response:
We assessed temporal autocorrelation for each taxon and can conclude that overall, our data are not prone to temporal autocorrelation. We have provided an overview of the autocorrelation plots for the most important families in the supplement ( Supplementary  Figs 8 and 9). And added the following sentence in the first paragraph of the statistical analysis section. Revised text L 533-535: "We tested each of the pelagic and benthic taxa for temporal autocorrelation using acf from the stats package and conclude that our trends and correlation networks are not influenced by temporal autocorrelation (Supplementary Figs. 8 and 9)."